Tags: llm* + prompt engineering*

10 bookmark(s) - sorted by: Date ↓

  1. LLM EvalKit is a streamlined framework that helps developers design, test, and refine prompt‑engineering pipelines for Large Language Models (LLMs). It encompasses prompt management, dataset handling, evaluation, and automated optimization, all wrapped in a Streamlit web UI.

    Key capabilities:

    | Stage | What it does | Typical workflow |
    |-------|-------------|------------------|
    | **Prompt Management** | Create, edit, version, and test prompts (name, text, model, system instructions). | Define a prompt, load/edit existing ones, run quick generation tests, and maintain version history. |
    | **Dataset Creation** | Organize data for evaluation; load CSV, JSON, or JSONL files into GCS buckets. | Create dataset folders, upload files, preview items. |
    | **Evaluation** | Run model‑based or human‑in‑the‑loop metrics; compare outcomes across prompt versions. | Choose prompt + dataset, generate responses, score with metrics like “question‑answering‑quality”, save baseline results to a leaderboard. |
    | **Optimization** | Leverage Vertex AI’s prompt‑optimization job to automatically search for better prompts. | Configure the job (model, dataset, prompt), launch it, and monitor progress in the Vertex AI console. |
    | **Results & Records** | Visualize optimization outcomes, compare versions, and maintain a record of performance over time. | View leaderboard, select best optimized prompt, paste new instructions, re‑evaluate, and track progress. |

    **Getting Started**

    1. Clone the repo, set up a virtual environment, install dependencies, and run `streamlit run index.py`.
    2. Configure `src/.env` with `BUCKET_NAME` and `PROJECT_ID` (a sample file is sketched below).
    3. Use the UI to create/edit prompts, datasets, and launch evaluations/optimizations as described in the tutorial steps.
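
    A minimal sketch of the `src/.env` file, using placeholder values for your own Google Cloud project and Cloud Storage bucket (swap in real IDs before launching the UI):

    ```dotenv
    # src/.env — placeholder values, replace with your own
    PROJECT_ID=my-gcp-project
    BUCKET_NAME=my-evalkit-bucket
    ```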

    **Token Use‑Case**

    - **Prompt**: “Problem: {{query}}\nImage: {{image}} @@@image/jpeg\nAnswer: {{target}}”
    - **Example input JSON**: query, choices, image URL, target answer (a sample record is sketched below).
    - **Model**: `gemini-2.0-flash-001`.
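
    A hypothetical JSONL record matching the fields above; the key names line up with the `{{query}}`, `{{image}}`, and `{{target}}` placeholders, but the values and bucket path are invented:

    ```json
    {"query": "Which shape has exactly three sides?", "choices": ["circle", "square", "triangle"], "image": "gs://my-evalkit-bucket/images/shapes.jpg", "target": "triangle"}
    ```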

    **License** – Apache 2.0.
  2. This article explores how prompt engineering can be used to improve time-series analysis with Large Language Models (LLMs), covering core strategies, preprocessing, anomaly detection, and feature engineering. It provides practical prompts and examples for various tasks.
  3. This article explores strategies for effectively curating and managing the context that powers AI agents, discussing the shift from prompt engineering to context engineering and techniques for optimizing context usage in LLMs.
  4. This article introduces the LLM Function Design Pattern, a structured approach to building AI-powered software. It addresses the challenges of integrating Large Language Models (LLMs) into applications by outlining a pattern that promotes modularity, testability, and maintainability. The pattern involves defining clear functions with specific inputs and outputs, and then leveraging LLMs to implement the core logic within those functions.
    2025-10-03 by klotz
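
    A minimal Python sketch of the pattern as summarized above, with the LLM client stubbed out; the function name, prompt, and `call_llm` helper are illustrative assumptions, not taken from the article:

    ```python
    from dataclasses import dataclass


    @dataclass
    class SummaryRequest:
        text: str
        max_sentences: int = 2


    @dataclass
    class SummaryResult:
        summary: str


    def call_llm(prompt: str) -> str:
        """Placeholder for whichever LLM client the application actually uses."""
        raise NotImplementedError


    def summarize(req: SummaryRequest) -> SummaryResult:
        """An 'LLM function': typed input and output, with the model supplying the core logic."""
        prompt = (
            f"Summarize the following text in at most {req.max_sentences} sentences:\n"
            f"{req.text}"
        )
        return SummaryResult(summary=call_llm(prompt))
    ```

    Because the boundary is an ordinary typed function, it can be mocked in tests and re-pointed at a different model without touching its callers, which is where the modularity and testability benefits come from.
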
  5. This guide offers five essential tips for writing effective GitHub Copilot custom instructions, covering project overview, tech stack, coding guidelines, structure, and resources, to help developers get better code suggestions.
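
    As a rough illustration of those five areas, a repository-level instructions file (typically `.github/copilot-instructions.md`) might look like the following; the project details are invented:

    ```markdown
    # Copilot custom instructions

    ## Project overview
    A small REST API for managing library book loans.

    ## Tech stack
    Python 3.12, FastAPI, SQLAlchemy, PostgreSQL; tests with pytest.

    ## Coding guidelines
    Use type hints everywhere, keep functions small, follow PEP 8.

    ## Structure
    `app/` contains routers and models; `tests/` mirrors the `app/` layout.

    ## Resources
    See `docs/architecture.md` for the data model and conventions.
    ```
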
  6. This article discusses the concept of 'tool masking' as a way to optimize the interaction between LLMs and APIs, arguing that simply exposing all API functionality (as done by MCP) is inefficient and degrades performance. It proposes shaping the tool surface to match the specific use case, improving accuracy, cost, and latency.
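
    A simplified Python sketch of the idea: rather than exposing every endpoint an API offers, the agent is handed only a masked, task-specific subset. The tool definitions below are invented for illustration:

    ```python
    # Full tool surface, e.g. generated from an API spec or an MCP server.
    ALL_TOOLS = {
        "list_tickets": {"description": "List support tickets", "params": ["status"]},
        "create_ticket": {"description": "Open a support ticket", "params": ["title", "body"]},
        "delete_ticket": {"description": "Delete a ticket", "params": ["ticket_id"]},
        "admin_reset": {"description": "Reset the whole workspace", "params": []},
    }


    def mask_tools(allowed: set[str]) -> list[dict]:
        """Expose only the tools relevant to the current use case."""
        return [{"name": name, **spec} for name, spec in ALL_TOOLS.items() if name in allowed]


    # A triage agent only ever needs to read and open tickets, so the model
    # never sees the destructive endpoints at all.
    triage_tools = mask_tools({"list_tickets", "create_ticket"})
    ```
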
  7. This article provides a practical guide to JSON prompting for Large Language Models (LLMs), demonstrating how structuring prompts with JSON improves consistency, accuracy, and scalability. It includes Python coding examples comparing free-form and JSON prompts, and provides access to full code notebooks.
    2025-08-27 by klotz
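
    A minimal Python contrast between the two styles, along the lines the article describes; the task and field names here are made up:

    ```python
    import json

    review = "Battery life is great, but the screen scratches far too easily."

    # Free-form prompt: the output format is left entirely to the model.
    free_form_prompt = (
        f"Analyze this product review and tell me the sentiment and main issues: {review}"
    )

    # JSON prompt: task, input, and expected output schema are stated explicitly.
    json_prompt = json.dumps(
        {
            "task": "analyze_product_review",
            "review": review,
            "output_schema": {"sentiment": "positive|negative|mixed", "issues": ["string"]},
        },
        indent=2,
    )

    print(free_form_prompt)
    print(json_prompt)
    ```

    The structured version makes the expected fields explicit, which is the property the article credits for the gains in consistency and downstream parseability.
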
  8. Security researchers have discovered that LLM chatbots can be made to ignore their guardrails by using prompts with terrible grammar and long, run-on sentences. This bypasses safety filters and allows the model to generate harmful responses. The research introduces the 'refusal-affirmation logit gap' as a metric for assessing model vulnerability.
    2025-08-26 by klotz
  9. GitHub Models is a suite of developer tools for AI development, offering features like prompt management, model comparison, and quantitative evaluations, integrated directly into GitHub workflows.
    2025-06-06 by klotz
  10. A post with pithy observations and clear conclusions from building complex LLM workflows, covering topics like prompt chaining, data structuring, model limitations, and fine-tuning strategies.
